Consider a heteroscedastic regression model, in which we observe N pairs
(y,x) following the model

$$E(y \mid x) = f(x,\beta); \qquad \text{Standard Deviation}(y \mid x) = \sigma\, g(x,\beta,\theta).$$


While  our  remarks  hold  generally,  in  what  follows  it  suffices  to  consider  the 
special  case  of  linear  regression  for  the  mean  and  the  power  of  the  mean  model 
for  the  standard  deviation,  i.e., 


$$f(x,\beta) = \beta_0 + \beta_1 x;$$

$$g(x,\beta,\theta) = f(x,\beta)^{\theta}.$$


When $\theta = 0$, we have the homoscedastic regression model, and unweighted
least squares will ordinarily be used to estimate $\beta$. For other values of
$\theta$, generalized least squares can be used to estimate $\beta$; see
Carroll & Ruppert (1987) for a discussion and a review of the literature.
Generalized least squares is weighted least squares with estimated weights.
The version of generalized least squares used here for each $\theta$ is fully
iterated reweighted least squares, sometimes called quasi-likelihood; see
McCullagh & Nelder (1983). In practice, $\theta$ is unknown and must be
estimated. The theory of such estimation is given by Davidian & Carroll (1986).
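
A minimal sketch of fully iterated reweighted least squares for the linear
mean and power-of-the-mean variance model, with $\theta$ treated as known,
might look as follows. The function name, the iteration cap, the tolerance,
the positivity guard on the fitted means, and the divisor N in the scale
estimate are assumptions of this sketch rather than details taken from the
source.

```python
import numpy as np

def gls_power_of_mean(x, y, theta, n_iter=50, tol=1e-10):
    """Iteratively reweighted least squares for E(y|x) = b0 + b1*x with
    SD(y|x) = sigma * (b0 + b1*x)**theta, theta treated as known."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    # Start from unweighted least squares (the theta = 0 fit).
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        # Guard (assumption of this sketch): the model needs positive means.
        mu = np.clip(X @ beta, 1e-8, None)
        sw = mu ** (-theta)                 # square root of the weights 1/g^2
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    mu = np.clip(X @ beta, 1e-8, None)
    scaled_resid = (y - X @ beta) / mu ** theta
    # Divisor N is a choice of this sketch; N - p is a common alternative.
    sigma2 = np.mean(scaled_resid ** 2)
    return beta, sigma2
```

With theta = 0 every weight equals one, so the loop reproduces the unweighted
least squares fit after a single pass.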

The common folklore theorem of generalized least squares states that as
long as one's estimate $\hat\theta$ of $\theta$ is root-N consistent, the
resulting generalized least squares estimate has the same asymptotic
distribution as if $\theta$ were known. See Judge, et al (1985) and Carroll &
Ruppert (1982, 1987) for references and proofs. Indeed, any generalized least
squares estimate has the same limit distribution as weighted least squares
based on the correct weights, i.e., the inverse of the square of (1.2).

The folklore theorem has an analogue in practice. In the linear
regression model with a reasonably sized data set, since unweighted least
squares is consistent, its fitted values rarely differ much from the fitted
values from a generalized least squares fit. Consequently, the usual practice
is to treat the estimation of the variance function $g(x,\beta,\theta)$ fairly
cavalierly, if at all. To quote Schwartz (1979), "there is one point of
agreement among statistics texts and that is the minimal effect of weighting
factors on fitted regression curves. Unless the variance nonuniformity is
quite severe, the curve fitted to calibration data is likely to be nearly the
same, whether or not the variance nonuniformity is included in the weighting
factors". The narrow focus on estimating the mean is misplaced, as Schwartz
later notes; see also Garden, et al (1980). Sometimes the variance function is itself of
importance.  Box  &  Meyer  (1985)  state  that  "one  distinctive  feature  of  Japanese 
quality  control  improvement  techniques  is  the  use  of  statistical  experimental 
design  to  study  the  effect  of  a  number  of  factors  on  variance  as  well  as  the 
mean".  Other  times  the  variance  function  essentially  determines  the  quantity 
of  interest.  This  occurs,  for  example,  in  the  estimation  of  the  sensitivity  of 
a  chemical  or  biochemical  assay,  see  Carroll,  Davidian  &  Smith  (1986). 
However,  there  are  even  more  basic  problems  where  the  variance  function  is  of 
considerable  importance,  namely  prediction  and  calibration. 

It is perhaps trite to state that how well one estimates the variance
function has a large effect on how well one can do prediction and calibration.
It is, however, a point that is rarely taken into account in practice, as any
review of the (rudimentary) techniques in the assay literature will quickly
show. There are two ways to see this point. The first is through an
asymptotic theory outlined in Section 3, where we show that the difference in
the length of a prediction interval between $\theta$ known and unknown is
asymptotically normally distributed with variance a monotone function of how
well one estimates $\theta$. The second and probably more useful way to see
the effect of variance function estimation is through an example. The large
costs involved in not weighting at all will be evident in this example, and
will serve as an object lesson.


Calibration experiments start with a training or calibration sample
$(y_i, x_i)$ and then fit models to the mean and variance structures.
The real interest lies in an independent pair $(y_0, x_0)$. Sometimes $x_0$
is known and we wish to obtain confidence intervals for $y_0$; this is
prediction. Other times, $y_0$ is easily measured but $x_0$ is unknown and
inference is to be made about it; see Rosenblatt & Spiegelman (1982).

For example, in an assay x might represent the concentration of a
substance and y might represent a counted value or intensity level which varies
with concentration. One will have a new value $y_0$ of the count or intensity
and wish to draw inference about the true concentration $x_0$. The calibration
sample is drawn so that we have a good understanding of how the response varies
as a function of concentration. The regression equation relating the response
to concentration is then inverted to predict the concentration from the
observed response.

For the remainder of this section we will assume that the responses are
normally distributed, although this can be relaxed. Given a value $x_0$, the
standard point estimate of the response $y_0$ is $f(x_0,\hat\beta)$. Let
$\hat\beta_G$ be a generalized least squares estimate, and define

$$S_G = S(\theta) = N^{-1} \sum_{i=1}^{N} f_\beta(x_i,\hat\beta_G)\, f_\beta(x_i,\hat\beta_G)^T / f(x_i,\hat\beta_G)^{2\theta};$$

$$\hat\sigma_G^2 = \hat\sigma^2(\theta) = N^{-1} \sum_{i=1}^{N} \{y_i - f(x_i,\hat\beta_G)\}^2 / f(x_i,\hat\beta_G)^{2\theta},$$

where $f_\beta$ is the derivative of f with respect to $\beta$. For large
calibration data sets, the variance in the error made by prediction is

$$\text{Variance}\{y_0 - f(x_0,\hat\beta_G)\} \approx \sigma^2 q_N^2(x_0,\beta,\theta), \quad \text{where}$$

$$q_N^2(x_0,\beta,\theta) = g^2(x_0,\beta,\theta) + N^{-1} f_\beta(x_0,\beta)^T S_G^{-1} f_\beta(x_0,\beta).$$

Note that if the size N of the calibration data set is large, then the error in
prediction is determined predominantly by the variance function

$$\sigma^2 g^2(x_0,\beta,\theta)$$

and not by the calibration data set itself. An approximate $(1-\alpha)100\%$
confidence interval for the response $y_0$ is given by

$$I(x_0) = \{\text{all values } y \text{ in the interval } f(x_0,\hat\beta_G) \pm t^{N-p}_{1-\alpha/2}\, \hat\sigma_G\, q_N(x_0,\hat\beta_G,\theta)\}, \tag{2.2}$$

where $t^{N-p}_{1-\alpha/2}$ is the $(1-\alpha/2)$ percentage point of a
t-distribution with N-p degrees of freedom. For large sample sizes, this
interval becomes

$$I(x_0) \approx \{\text{all } y \text{ in the interval } f(x_0,\hat\beta_G) \pm t^{N-p}_{1-\alpha/2}\, \hat\sigma_G\, g(x_0,\hat\beta_G,\theta)\}. \tag{2.3}$$

The prediction interval (2.2) is only an approximate $(1-\alpha)100\%$
confidence interval because the function $q_N$ is not known but rather must be
estimated.
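
To make the plug-in estimate of $q_N$ concrete, the sketch below evaluates it
for the linear mean and power-of-the-mean model. It assumes the reading
$q_N^2 = g^2 + N^{-1} f_\beta^T S_G^{-1} f_\beta$ of the definition above; the
function name and argument layout are hypothetical, and $f_\beta(x) = (1, x)^T$
because the mean is linear in $\beta$.

```python
import numpy as np

def q_n(x0, x, beta, theta):
    """Plug-in q_N(x0, beta, theta) for mean b0 + b1*x and g = mean**theta."""
    x = np.asarray(x, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n = len(x)
    X = np.column_stack([np.ones_like(x), x])   # rows are f_beta(x_i)^T
    mu = X @ beta
    w = mu ** (-2.0 * theta)                    # 1 / g^2 at each design point
    S = (X * w[:, None]).T @ X / n              # S_G
    f0 = np.array([1.0, x0])                    # f_beta(x0)
    g0_sq = (f0 @ beta) ** (2.0 * theta)        # g^2(x0, beta, theta)
    return np.sqrt(g0_sq + f0 @ np.linalg.solve(S, f0) / n)
```

As N grows, the second term shrinks at rate 1/N and the value approaches
$g(x_0,\beta,\theta)$, which is the passage from (2.2) to (2.3).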

The effect of ignoring the heterogeneity can be seen through examination
of (2.3). If $\hat\sigma_L^2$ is the unweighted mean squared error, then for
large samples we have the approximation

$$\hat\sigma_L^2 \approx \sigma^2 g^2_{mean} = \sigma^2 N^{-1} \sum_{i=1}^{N} g^2(x_i,\beta,\theta).$$

Thus the unweighted prediction interval for large sample sizes, based on the
unweighted least squares estimate $\hat\beta_L$, is approximately

$$I_L(x_0) \approx \{\text{all } y \text{ in the interval } f(x_0,\hat\beta_L) \pm t^{N-p}_{1-\alpha/2}\, \sigma\, g_{mean}\}. \tag{2.4}$$

Comparing  (2.3)  and  (2.4)  we  see  that  where  the  variability  is  small,  the 
unweighted  prediction  interval  will  be  too  long  and  hence  pessimistic,  and 
conversely  where  the  variance  is  large. 
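
The comparison of (2.3) and (2.4) can be illustrated numerically. The sketch
below uses made-up design points and parameter values (not the data analyzed
later), treats the parameters as known, and substitutes the normal percentage
point for the t percentage point, which is reasonable only for large N; all
names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def half_widths(x0, x, beta, sigma, theta, alpha=0.05):
    """Large-sample half-widths of the weighted (2.3) and unweighted (2.4)
    prediction intervals at x0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # normal stand-in for t
    x = np.asarray(x, dtype=float)
    g0 = (beta[0] + beta[1] * x0) ** theta    # g(x0, beta, theta)
    g_all = (beta[0] + beta[1] * x) ** theta
    g_mean = np.sqrt(np.mean(g_all ** 2))     # root mean square of g
    return z * sigma * g0, z * sigma * g_mean

# Illustrative values only (assumptions, not estimates from any data set).
x = np.linspace(1.0, 50.0, 100)
beta = np.array([5.0, 20.0])
for x0 in (1.0, 50.0):
    weighted, unweighted = half_widths(x0, x, beta, sigma=0.2, theta=1.0)
    print(f"x0={x0:5.1f}  weighted={weighted:8.1f}  unweighted={unweighted:8.1f}")
```

The unweighted half-width is the same at every x0, so it exceeds the weighted
one wherever g is below its root mean square and falls short of it elsewhere.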

Now suppose that we are given the value of the response $y_0$ and wish to
estimate and make inference about the unknown $x_0$. The estimate of $x_0$ is
that value which satisfies $f(x_0,\hat\beta_G) = y_0$. The most common interval
estimate of $x_0$ is the set of all values x for which $y_0$ falls in the
prediction interval I(x),

Calibration interval for $x_0$ = { all x such that $y_0 \in I(x)$,
where I(x) is given by (2.3) }.
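
One simple way to compute such a calibration interval is to scan a grid of
candidate x values and keep those whose prediction interval contains $y_0$.
The sketch below does this for the large-sample interval (2.3) with the
parameters treated as known and the normal percentage point in place of the t
percentage point. The function name and the grid are hypothetical, and the
sketch assumes that the accepted x values form a single interval covered by
the grid.

```python
import numpy as np
from statistics import NormalDist

def calibration_interval(y0, beta, sigma, theta, x_grid, alpha=0.05):
    """Smallest and largest grid value x with y0 inside I(x) from (2.3)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    x_grid = np.asarray(x_grid, dtype=float)
    mu = beta[0] + beta[1] * x_grid       # f(x, beta) on the grid
    half = z * sigma * mu ** theta        # half-width of I(x)
    inside = np.abs(y0 - mu) <= half
    if not inside.any():
        return None                       # no grid point is consistent with y0
    return x_grid[inside].min(), x_grid[inside].max()
```

When theta is positive the half-width grows with x, so the resulting interval
is not symmetric about the point estimate of $x_0$.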

The effect of not weighting is confidence intervals for $x_0$ that are too
long and pessimistic where the variance is small, and the opposite where the
variance is large. As far as we know, little work has been done to determine whether one
can  shorten  the  calibration  confidence  interval  by  making  more  direct  use  of 
the  variance  function. 


Assume throughout that the data are symmetrically distributed about their
mean. Let $\hat\beta_G$ be any generalized least squares estimate of $\beta$
based on an estimate of $\theta$, call it $\hat\theta$ say. Davidian & Carroll
(1986) introduce a class of estimators of $\theta$ which depend on the data
only through $\hat\beta_G$, the design $\{x_i\}$, and either sample variances
from replicates at each design point or transformations of the squared
residuals

$$\{y_i - f(x_i,\hat\beta_G)\}^2.$$

This class of estimators includes most methods in the literature; see Judge,
et al (1985). Davidian & Carroll (1986) show that all members of their class
of estimators have the asymptotic expansion

$$N^{1/2}(\hat\theta - \theta) = W_N + a^T N^{1/2}(\hat\beta_G - \beta) + o_p(1). \tag{3.1}$$

In (3.1), a is a fixed vector and $W_N$ is asymptotically normally distributed.
Because the observations have a symmetric distribution, $W_N$ is asymptotically
uncorrelated with $\hat\beta_G$.
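
As an illustration of an estimator built from a transformation of the squared
residuals, the sketch below regresses the logarithm of the absolute residuals
on the logarithm of the fitted values; under the power-of-the-mean model the
slope estimates $\theta$. This is a familiar textbook device, shown here as an
assumption-laden example and not as the specific estimator studied in the
source; the function name is hypothetical.

```python
import numpy as np

def theta_from_log_residuals(x, y, beta):
    """Slope of log|residual| on log(fitted mean), an estimate of theta when
    SD(y|x) = sigma * mean**theta."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = beta[0] + beta[1] * x
    r = np.abs(y - mu)
    keep = (r > 0) & (mu > 0)             # logarithms need positive inputs
    Z = np.column_stack([np.ones(keep.sum()), np.log(mu[keep])])
    coef = np.linalg.lstsq(Z, np.log(r[keep]), rcond=None)[0]
    return coef[1]                        # intercept absorbs log(sigma)
```

The estimate depends on the data only through the fitted values and the
residuals, which is the structure described above.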

Let $\hat\beta_G(\theta)$ and $\hat\beta_G(\hat\theta)$ be generalized least
squares estimates of $\beta$ with $\theta$ known and unknown respectively, and
let $\hat\sigma(\theta)$ and $\hat\sigma(\hat\theta)$ be the corresponding
estimates of $\sigma$. The lengths of the prediction intervals with $\theta$
known and unknown are proportional to $L(\theta)$ and $L(\hat\theta)$
respectively, where

$$L(\theta) = \hat\sigma(\theta)\, q_N(x_0,\hat\beta_G(\theta),\theta).$$

The random variable

$$\Delta L = N^{1/2}\{L(\hat\theta) - L(\theta)\}/\sigma$$

describes how well one approximates the length one would use if $\theta$ were
known. Intuitively, we would like $\Delta L$ to have smallest possible
variability.

THEOREM: Suppose that $W_N$ in (3.1) is asymptotically normally distributed
with mean zero and covariance $C = C(\hat\theta)$ depending on the method of
estimating $\theta$. Then, under regularity conditions, $\Delta L$ is
asymptotically normally distributed with variance an increasing function of
$C(\hat\theta)$.

NOTE: The Theorem remains valid if $\Delta L$ is the normalized difference in
length between the interval with $\theta$ unknown and the interval with
completely specified variance function.


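PROOF (Sketch): It is easily seen that in the definition of L we may replace
$q_N$ by g. Further, $\Delta L = A_1 + A_2 + A_3$, where

$$A_1 = N^{1/2} g(x_0,\hat\beta_G(\theta),\theta)\, \{\hat\sigma(\hat\theta) - \hat\sigma(\theta)\}/\sigma;$$

$$A_2 = N^{1/2} \{\hat\sigma(\hat\theta)/\sigma\}\, \{g(x_0,\hat\beta_G(\hat\theta),\hat\theta) - g(x_0,\hat\beta_G(\hat\theta),\theta)\};$$

$$A_3 = N^{1/2} \{\hat\sigma(\hat\theta)/\sigma\}\, \{g(x_0,\hat\beta_G(\hat\theta),\theta) - g(x_0,\hat\beta_G(\theta),\theta)\}.$$

Now, $A_3 \to_p 0$ since, from Carroll & Ruppert (1987), we have that

$$N^{1/2}\{\hat\beta_G(\hat\theta) - \hat\beta_G(\theta)\}/\sigma \to_p 0.$$

By a Taylor series,

$$A_2 = N^{1/2} g_\theta(x_0,\beta,\theta)\,(\hat\theta - \theta) + o_p(1).$$

Lemma A.3 of Carroll, Davidian & Smith shows that for some constant $b(x_0)$,

$$A_1 = b(x_0)\, N^{1/2}(\hat\theta - \theta) + o_p(1).$$

This shows that for some constant $c(x_0)$,

$$\Delta L = c(x_0)\, N^{1/2}(\hat\theta - \theta) + o_p(1).$$

The proof is completed by applying (3.1).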
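
The final step can be spelled out as follows. This display is a sketch and is
not taken from the source: V is a symbol introduced here for the asymptotic
covariance matrix of $N^{1/2}(\hat\beta_G - \beta)$, and AVar denotes
asymptotic variance. Substituting (3.1) into the expansion of $\Delta L$, and
using the fact that $W_N$ is asymptotically uncorrelated with $\hat\beta_G$ so
that the cross term vanishes, gives

```latex
\Delta L = c(x_0)\,\{ W_N + a^T N^{1/2}(\hat\beta_G - \beta) \} + o_p(1),
\qquad
\mathrm{AVar}(\Delta L) = c(x_0)^2\,\{ C + a^T V a \},
```

which, for fixed a and V and $c(x_0) \neq 0$, is increasing in C.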


4. AN EXAMPLE

In Chapter 2, Section 8, Carroll & Ruppert (1987) present the results of
an assay for the concentration of an enzyme (esterase). There were 113
observations, of which 5 were deleted. The observed concentration of esterase
was recorded and then a binding experiment was undertaken, so that the response
is the count of the number of bindings. These data were given to us by another
statistician and we are unable to give further detail on the background of
the experiment. We do not know whether the recorded concentration of esterase
has been accurately measured, although we will assume it has been and that
there is little if any measurement error in this predictor. The lack of
replicates in the response is rather unusual in our experience. Since the
response is a count, one might expect Poisson variation, i.e., the power of the
mean model holds with $\theta = 0.50$. In our experience with assays, such a
model almost always underestimates $\theta$, with values between 0.60 and 0.90
being much more common; see Finney (1976) and Raab (1981a).

The eventual goal of the study is to take observed counts and infer the
concentration of esterase, especially for smaller values of the latter. As is
typical in these experiments, a calibration or training data set is taken for
which the predictor variable esterase is known, as is the counted response.
Carroll & Ruppert (1982) plot the data, which appear reasonably although not
perfectly linear. Actually, the logarithm of the response plotted against the
logarithm of the predictor may appear more linear to some, and less
heteroscedastic. As is evident from that plot, the data exhibit rather severe
heterogeneity of variance. The Spearman correlation between absolute
studentized residuals and predicted values from an unweighted least squares fit
is $\rho = 0.39$ with formal computed significance level < 0.0001. Analysis as
in Carroll & Ruppert (1982) indicates that the constant coefficient of
variation model $\theta = 1.0$ is reasonable, although a value $\theta = 0.9$
might be even better. For $\theta = 1.0$, the Spearman correlation between
absolute studentized residuals and predicted values is $\rho = -0.10$, with
significance level 0.29. In Figure 1, we plot kernel regression estimates of
the Anscombe studentized residuals, i.e., the absolute studentized residuals to
the power 2/3; see McCullagh & Nelder (1983). Note that the plots indicate that
$\theta = 1.0$ does a far better job of accounting for the heteroscedasticity.
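
The Spearman diagnostic described above can be sketched in a few lines. The
version below is a simplification: it scales the absolute residuals by the
assumed standard deviation function only and ignores the leverage adjustment
that full studentization would apply. The function names are hypothetical,
and nothing here reproduces the esterase data or the reported correlations.

```python
import numpy as np

def average_ranks(v):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v))
    ranks[order] = np.arange(1, len(v) + 1)
    for val in np.unique(v):
        tie = v == val
        ranks[tie] = ranks[tie].mean()
    return ranks

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]

def heteroscedasticity_check(x, y, beta, theta):
    """Spearman correlation between scaled absolute residuals and fitted
    values; values near zero suggest theta accounts for the variance trend."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = beta[0] + beta[1] * x
    scaled = np.abs(y - mu) / mu ** theta
    return spearman(scaled, mu)
```

Calling `heteroscedasticity_check` once with theta = 0 and once with a
candidate theta mirrors the before-and-after comparison made in the text.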

In  these  data,  the  effect  of  not  weighting  should  be  to  have  prediction 
and  calibration  confidence  intervals  which  are  much  too  large  for  small  amounts 
of  esterase  and  conversely  for  large  amounts.  In  Figure  2  we  plot  the  95% 
prediction  intervals  for  the  count  response  for  unweighted  versus  weighted 
regression:  the  effect  is  clear.  A  similar  plot  for  the  calibration  intervals 
shows  the  same  effect:  the  unweighted  analysis  is  much  too  conservative  for 
small  amounts  of  esterase,  and  much  too  liberal  for  larger  amounts.  As 
Oppenheimer, et al (1983) state, "Rather dramatic differences have been
observed depending on whether a valid weighted or invalid unweighted analysis
is used".

This example shows that the actual prediction intervals are sensitive to
misspecification of the variance function. It should be clear, by inference
and from the previous section, that one should make efforts to estimate the
structural variance parameter $\theta$ as well as possible.

FIGURE 1


The  esterase  assay  data.  This  is  a  plot  of  the  kernel 
regression  fits  to  the  Anscombe  absolute  residuals  against  the 
logarithms  of  the  predicted  values.  The  unweighted  least 
squares  fit  is  the  solid  line,  while  the  generalized  least 
squares  fit  for  the  constant  coefficient  of  variation  model  is 
the dashed line. Endpoint effects have been adjusted for by
selective  deletion. 


FIGURE 2

The esterase assay data. These are the 95% prediction
intervals  for  a  new  response.  The  dashed  line  is  unweighted 
least  squares,  while  the  solid  line  is  the  constant 
coefficient  of  variation  fit.  The  lower  part  of  the  least 
squares  interval  has  been  truncated  at  zero  where  necessary. 


